Total Rank Distance And Scaled Total Rank Distance: Two Alternative Metrics In Computational Linguistics

Authors

  • Anca Dinu
  • Liviu P. Dinu
Abstract

In this paper we propose two metrics for use in various areas of computational linguistics. Our construction is based on the supposition that in most natural languages the most important information is carried by the first part of the unit. We introduce the total rank distance and the scaled total rank distance, prove that they are metrics, and investigate their maximum and expected values. Finally, we present a short application: we investigate the similarity of Romance languages by computing the scaled total rank distance between the digram rankings of each language.
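
The abstract names the two metrics but does not state their definitions, so the following is only a rough sketch under stated assumptions, not the authors' formulation. It assumes a base rank distance in which an item at 1-based position p of a ranking of length n scores n + 1 - p (absent items score 0), a total distance that sums this base distance over all prefixes, and a scaled variant that divides the i-th prefix term by i(i + 1). The function names and the word-internal digram extraction are hypothetical choices made for illustration.

```python
from collections import Counter


def digram_ranking(text, top_n):
    """Rank word-internal digrams by descending frequency (ties alphabetical)."""
    counts = Counter(
        word[i:i + 2]
        for word in text.lower().split()
        for i in range(len(word) - 1)
    )
    ordered = sorted(counts, key=lambda d: (-counts[d], d))
    return ordered[:top_n]


def rank_distance(a, b):
    """Assumed base distance: sum of absolute score differences.

    An item at 0-based index i of a ranking r scores len(r) - i,
    so earlier items weigh more; items missing from a ranking score 0.
    """
    score_a = {x: len(a) - i for i, x in enumerate(a)}
    score_b = {x: len(b) - i for i, x in enumerate(b)}
    return sum(
        abs(score_a.get(x, 0) - score_b.get(x, 0))
        for x in set(a) | set(b)
    )


def total_rank_distance(a, b):
    """Assumed total distance: base distance summed over all prefixes."""
    n = min(len(a), len(b))
    return sum(rank_distance(a[:i], b[:i]) for i in range(1, n + 1))


def scaled_total_rank_distance(a, b):
    """Assumed scaled variant: the i-th prefix term is divided by i * (i + 1)."""
    n = min(len(a), len(b))
    return sum(
        rank_distance(a[:i], b[:i]) / (i * (i + 1))
        for i in range(1, n + 1)
    )
```

Under these assumptions, a disagreement near the top of two rankings is counted in every prefix that contains it, which is one way to reflect the supposition that the first part of the unit carries the most information. Two digram rankings built with `digram_ranking` from text samples of two languages could then be compared with `scaled_total_rank_distance`.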

Similar resources

A novel three-stage distance-based consensus ranking method

In this study, we propose a three-stage weighted sum method for identifying the group ranks of alternatives. In the first stage, a rank matrix, similar to the cross-efficiency matrix, is obtained by computing the individual rank position of each alternative based on importance weights. In the second stage, a secondary goal is defined to limit the vector of weights since the vector of weights ob...

On the Behavior of Certain Metric on the Permutations Group

When a new metric is introduced, often there is a "hidden variable" in the similarity relation (which frequently depends on the specific area of research), so that we should always speak of similarity with respect to some property, and there is a plethora of measures in part because researchers are often inexplicit on this point. On the other hand, one should have some knowledge about the nature of t...

Meta Search Engine using Multi-Objective Partial Rank Aggregation: Application in Ranking WebPages

Although there are hundreds of search engines, no single search engine can satisfy all web users or be considered sufficiently comprehensive in its coverage of the web. Moreover, search results contain “spam pages”, where a web page gets an undeservedly high rank. Therefore, a robust technique for a meta search engine is required that can effectively combat “spam pages”, a serio...

RankEval: Open Tool for Evaluation of Machine-Learned Ranking

Recent research and applications for evaluation and quality estimation of Machine Translation require statistical measures for comparing machine-predicted ranking against gold sets annotated by humans. In addition to the existing practice of measuring segment-level correlation with Kendall tau, we propose using ranking metrics from the research field of Information Retrieval such as Mean Recipro...

Skew and linearized Reed-Solomon codes and maximum sum rank distance codes over any division ring

Reed-Solomon codes and Gabidulin codes have maximum Hamming distance and maximum rank distance, respectively. A general construction using skew polynomials, called skew Reed-Solomon codes, has already been introduced in the literature. In this work, we introduce a linearized version of such codes, called linearized Reed-Solomon codes. We prove that they have maximum sum-rank distance. Such dist...

Publication date: 2006